January 19, 2020 | TRENDS FROM THE TRENCHES—Dan Stanzione has been 
working at the Texas Advanced Computing Center in Austin for three quarters 
of its 19-year history. Now the executive director and associate vice president 
of research, Stanzione runs TACC as both the research computing arm of the 
University of Texas and an independent National Science Foundation-funded 
supercomputing center. 


TACC houses a host of computing resources—all with Texas-themed names. Frontera is a Dell 
C6420 system ranked eighth on the June 2020 Top500 list of supercomputers. Its 23.5 HPL 
petaflops are achieved with 448,448 Intel Xeon cores. Stampede2 is the flagship supercomputer of 
the Extreme Science and Engineering Discovery Environment (XSEDE) and achieves 18 petaflops 
of peak performance with 4,200 Knights Landing (KNL) nodes—the second generation of 
processors based on Intel's Many Integrated Core (MIC) architecture—and 1,736 Intel Xeon 
Skylake nodes. Then there are Lonestar5 (1,252 Cray XC40 compute nodes), Longhorn (108 
IBM Power System AC922 nodes), Maverick2 (NVIDIA GPUs), and many more. 


With so much big compute to wrangle, Stanzione and TACC are adept at provisioning 
systems, balancing loads, and predicting trends in scientific computing. As part of the Trends from 
the Trenches column, Stan Gloss sat down with Stanzione to talk about big compute, pandemics, 
and technology trends. 


Editor’s Note: Trends from the Trenches is a regular column from the BioTeam, offering a peek 
behind the curtain of some of their most interesting case studies and projects at the intersection of 
science and technology. 


Stan Gloss: What percentage of the research that you do at TACC is actually supporting life 
sciences, and how's that changed over time? 


Dan Stanzione: There's been a surge in research associated with COVID-19; that's sort of skewed 
the numbers for the last few months. Starting in early March through about the end of June, [our 
workload has] gone to about 30% COVID-19 support, both technical staff and computing cycles. I 
think we have 45 projects that we're supporting, some of which are very large collaborations with 
huge teams around the world. I think just one of them involves 600 researchers. 


I would say prior to COVID-19, we were probably running 15% or 20% life sciences across the 
center. Our computing time has been particularly [focused] on the molecular dynamics and protein 
structure side of life sciences. Relating to COVID, [the work] has been more on the data science, 
epidemiology, data integration pieces of it. [That work] doesn't use the [compute] cycles the way 
the protein structure stuff does, but it certainly uses the people time and the software. 


During normal times, how much of your work is committed to UT researchers? 


Our big machines are supported by NSF, federally supported. I keep about 10% of the cycles 
at home for UT folks, and the rest go around the world. Ninety percent of our users are not at UT 
Austin. 


If I were working on an NSF-funded program, could I get access to TACC? How does one get 
time? 


There are several mechanisms and programs that we use. For Stampede2 and Wrangler, and a few of 
our other platforms, there's a shared services group at NSF, a project called XSEDE, that allocates 
time among the various supercomputing centers. They take allocation requests quarterly. 
That's for the larger things. 


If you are at a university and you're doing open research, you can just apply for a start-up allocation 
—we turn that around in-house in about a week—and get on the machine. But as your usage grows 
to many thousands of hours, you can apply quarterly to XSEDE. You write a proposal that shows 
you know what your science is and why you need the time—justify it. It's a competitive process. 
There's no cost to do it, you just apply for time. 


For our very largest machine, Frontera, it's in a separate track at NSF and we have a separate 
allocations proposal where we also take quarterly requests for the very largest projects to go on the 
machine. Again, it's proposal-driven and peer-reviewed once per quarter to get onto the machine. 


You don't have to be NSF-supported, although that's the bulk of our users and you get preference if 
you have support from NSF, but probably 10 or 15% of the cycles we allocate go to NIH-supported 
researchers, and also DOE- or USDA-supported researchers. It can be any funding source that's not classified, 
mostly academic. We will take industry users for open, publishable research through the NSF 
process, or we can always do it for cash. We have a chargeback mechanism to get access if you're 
doing something that's not open or you're not getting enough time through the publicly funded ways 
to get time. 


You mentioned that Frontera had a special allocation that's different than Stampede and the 
others. Tell me about Frontera and why is that different? 


Frontera is the second in NSF's series of leadership class systems. Frontera was number five in the 
world when it debuted about a year ago. It's still in the top 10 in the world, I think, on the new list 
two weeks ago, at the time we're recording this. It's still one of the 10 biggest machines anywhere in 
the world. It's the biggest university-based resource anywhere in the world, certainly in the United 
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States. There are a few large government machines in China and the US that are larger, but we're 
sort of the largest, truly open academic machines out there. 


We followed another big NSF machine, Blue Waters, that was funded in a similar track. There are 
a lot of people who want time on these machines, and there's a lot of tension between the large 
number of people who want access for all the different computational problems that are out there, 
and the problems that need a whole lot of time to make progress: problems that need a third of the 
machine for two or three months to solve a single problem, or they can't do anything. 


We use the XSEDE machines to deal with the capacity problem, supporting the thousands of 
users that we have, and we reserve Frontera for the capability problem: the few users who 
need a whole lot of time. Stampede2, which is also still in the top 25 in the world, has 10 times the 
users and projects, but each individual user’s share is much smaller. We have literally 3,000 projects 
on Stampede2, and on Frontera we keep around 50 or 60 at any given time. As you might imagine, the 
average Frontera project gets a lot more computing time. We're really reserving the largest single challenges 
to run on Frontera, and with Stampede2, we're trying to promote broad access to high-performance 
computing. 


How do you manage 3,000 simultaneous projects? That sounds pretty daunting. 


I'll give you the short answer, but yes. Although there are moves away from this in some segments 
of the enterprise, most of this is still batch-scheduled. That's the notion of the allocation process: 
everybody gets a fixed amount of time. We have an accounting system that deducts that time as they 
submit to a queue for jobs, so you just sort of run them in the order they show up. [We consider] 
different things about priority and fairness and scheduling and prioritizing big jobs so little jobs 
don't starve them out. But essentially, we have hundreds of users log in every day and submit 
thousands of jobs every day. And we just queue them up and run them. Keep the machine busy, 365 
days a year doing this stuff and crank through. Both machines do over a million jobs a year at this 
point. 
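The allocation model described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not TACC's scheduler: it assumes allocations are tracked in node-hours, runs jobs strictly in arrival order, and leaves out the priority, fairness, and big-job handling mentioned in the answer. The names `Job`, `AllocationQueue`, `submit`, and `run_all` are hypothetical.

```python
from __future__ import annotations

from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    project: str
    nodes: int
    hours: float

    @property
    def node_hours(self) -> float:
        # Assumed accounting unit: nodes multiplied by wall-clock hours.
        return self.nodes * self.hours


class AllocationQueue:
    """Hypothetical first-come-first-served queue with per-project accounting."""

    def __init__(self, allocations: dict[str, float]) -> None:
        # Remaining node-hours per project: the "fixed amount of time" each gets.
        self.balance = dict(allocations)
        self.queue: deque[Job] = deque()

    def submit(self, job: Job) -> bool:
        # Refuse jobs from unknown projects or jobs larger than the remaining balance.
        if job.project not in self.balance or job.node_hours > self.balance[job.project]:
            return False
        self.queue.append(job)
        return True

    def run_all(self) -> list[Job]:
        # Run jobs in the order they showed up, deducting time as each one runs.
        # Several queued jobs can jointly exceed a balance, so check again here.
        finished: list[Job] = []
        while self.queue:
            job = self.queue.popleft()
            if job.node_hours <= self.balance[job.project]:
                self.balance[job.project] -= job.node_hours
                finished.append(job)
        return finished
```

For example, a project holding 100 node-hours can queue a 10-node, 5-hour job, and its balance drops to 50 once the job runs, while an 8-node, 10-hour job from a project with only 40 node-hours left is refused at submission. A real batch system would apply priority and fair-share rules in the run loop instead of simply popping jobs in arrival order.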


Do you have many commercial clients who want to pay for 
some time? 


We have a fair number. People directly using time on the systems tend to be the smaller and 
midsized companies across several industries. We have a couple of aerospace companies; we have a few 
oil and gas companies who do production computing with us. 


We also have partnerships with a lot of the large industry customers to do benchmarking. They want 
access to our systems to test things out, but it's more about learning from us to build their own 
in-house infrastructure. So they do a limited amount of computing, but they're asking us to test codes. 
Or they come to trainings; they come to our annual industrial partners meetings to exchange best 
practices. Given that we're in Texas, we have most of the large oil and gas companies participate 
through those meetings. Altogether we work with probably 40 or 50 companies. 


So there's an educational component to what you do? 


Oh, absolutely. Our job is to figure out how to use advanced computing technologies to create 
scientific, engineering, and societal outcomes. That means not just buying and deploying the 
systems, but operating them and training people to use them. I think our staff is more valuable than 
our machines, quite frankly. Computers are relatively easy to get. Computers run by professionals 
who know the science, build the software stack, and keep things running: that's the scarce 
commodity. 


In addition to the 30,000-ish or so servers that we run across the various different large computing 
platforms that we have, we have about 170 staff who take care of these things ranging from life 
science experts to astronomers and chemists and computer science experts, machine learning 
experts, data curation experts. Increasingly, the workflow in science is you bring together a bunch of 
data from a bunch of sources. We have to clean it and integrate it and do a fair amount of 
preprocessing on that. You're probably going to do both simulation and some form of AI somewhere in 
the workflow. Then you're going to need to understand that data with visualization or some other 
technique for data analysis. Finally, you're going to want to publish and reproduce those results over 
time. So we try and be part of that whole computational science workflow in what we do. 


That's incredible. How do you forecast where supercomputing is going to be two to three 
years out? 


Yeah, I'm already designing machines for 2024. These are procurements of tens of millions of dollars, 
so you don't want to buy old technologies. Computer technologies have a pretty short lifespan. 
Usually we're trying to make a decision about two years before a technology comes to market. For 
Frontera, it was a proposal-driven process, and there was a competition in which proposals were submitted to the 
government for a decision. To some extent, we pick a technology and a vendor team to work with, 
and our competitors might pick different ones. And then the competition sorts out who 
the winner is. We submitted the proposal for Frontera two full years before the start of production. 
We were extrapolating performance on chips that did not yet exist in close conjunction with our 
vendor partners. 


Is a vendor partner someone like Intel? 


Yes, Intel was our chip provider, although Frontera actually has an ecosystem. We have a GPU 
subsystem; we have a large memory subsystem. The primary compute is CPU-based Intel Cascade 
Lake Xeon. We were working way ahead of release with Intel. Fortunately, in that case, it was a 
fairly incremental change from the Skylake Xeons we had used on Stampede2 the year before, so 
we had some insight into what was going to happen. And it was sort of a linear extrapolation. But as 
you change technologies, that's not always true. 


We worked with Intel and Dell on the main part of the system. We actually have an IBM and Nvidia 
piece to the system as well, and then another single-precision-focused subsystem with Nvidia that's 
oil-cooled with Green Revolution Cooling. 


It is tough to stay abreast of these things. We can work with the chip manufacturers on what the 
roadmap is for technology, but we have to translate this into delivered science results. 


Right now, one thing we're evaluating pretty closely is chips that do tensor processing. They can 
reduce precision, often down to 16 bits. We're trading off accuracy for speed in those situations, 
which in the case of neural networks makes a lot of sense, because you're just weighting the 
connections between neurons, essentially, for most of the computation. You really just need to 
know, is this one important or not important? 
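To make the accuracy-for-speed trade-off concrete, here is a small Python illustration (ours, not TACC's) that rounds values to IEEE 754 half precision using the standard library `struct` module's `'e'` format character. The weight values are made up for the example. It shows that storing weights in 16 bits introduces only a small relative error and, for these values, preserves their ordering (the "important or not important" question), while a small increment to 1.0 disappears entirely.

```python
import struct


def to_fp16(x: float) -> float:
    """Round a Python float (64-bit) to IEEE 754 half precision and back."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


# Hypothetical connection weights in one neural network layer.
weights = [0.8, 0.05, 0.3, 0.1]
half = [to_fp16(w) for w in weights]

# Relative error introduced by storing each weight in 16 bits.
rel_err = [abs(h - w) / w for h, w in zip(half, weights)]

# Rank the weights from least to most important, before and after rounding.
rank = sorted(range(len(weights)), key=lambda i: weights[i])
rank_half = sorted(range(len(half)), key=lambda i: half[i])

# A small increment to 1.0 is below half-precision resolution near 1.0,
# so it is rounded away: this is the accuracy being traded for speed.
lost = to_fp16(1.0 + 0.0004)
```

Near 1.0, half precision can only resolve steps of about 0.001, which is fine for deciding whether a connection matters but not for codes that accumulate many small updates.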


We can sort of understand the chip design and how it works, but can we build a software ecosystem 
that lets people build applications on it? How much change do our users have to go through? Because 
again, we're supporting several thousand academic research teams; they don't all have large staffs of 
programmers to go in and make changes. We want to pull our users forward with what we think the 
best technologies are, but we can't get too far ahead of them, or they just won't use the machines. If 
it's a radical change and they have one grad student who's using some code they inherited, they 
can’t spend two years recoding it around a new technology. 


We put gradual pressure on them to change as the systems change, and then we have to work with 
the vendors to make sure we're not making too radical a change each time. This is why you see the 
very incremental rollout of technologies, like GPUs, where it's taken a decade and a half to really 
get penetration. It wasn’t because the chips weren’t ready, but because the software wasn't ready to 
use them. That's a huge problem. We have thousands of applications that we support that need to 
migrate to these new technologies. When we're looking at very different chips, we are scared that 
we might build something that our users don't want. 


Yes. For example, in life sciences there was a big push for Hadoop, but none of the 
scientists wanted to modify their codes to take advantage of it. 


Yeah, and that Hadoop model is largely gone as a result. It was sort of a technology fad that came 
and went. Some of the codes, especially in things like weather, have been around for 20 or 30 years 
and they can't turn on a dime for a fad. 


You have an ecosystem of high-performance computing technologies designed to do whatever the 
client needs. So if it maps very nicely to a GPU, you have GPUs available. Are you basically 
taking what your trend lines are for the types of uses that you're seeing on the systems and 
mapping the utilization of the different types of technologies based on that? 


Yep. There are really three sources of information we use to drive those sorts of decisions. First, we 
actually get users together and ask about forward-looking future challenges and how they see 
science changing. They tell you, "It's going to be more data-intensive," or, "We're going to have 
more uncertainty quantification." Or whatever it is looking forward. Then we get their aspirational 
goals and desires around that. Those aren't always guaranteed to match reality, 
although they are an important source of input. Second, we look at the allocation proposals our users are 
actually writing and look at the change in those over time. When push comes to shove, what are 
they really asking for? That gives you a slightly different snapshot: the present reality. 


Third, we look back with workload analysis over time of how the cycles are actually used on the 
machines that we have, and what runs where. Often that tells a somewhat different story. When we 
ask users, five years from now always looks dramatically different than today. If we look back at the 
workloads over the last 10 years, there are some changes, but they're remarkably consistent. Even 
though, in each of those 10 years, when we asked for a five-year vision, it was always going to be 
terribly different five years away. In truth, it stayed relatively constant. There's more of everything, 
but the mix is about the same: molecular dynamics versus astrophysics, how much finite element 
method versus how many FFTs are out there. We see some shifts, but they tend to not be that rapid. 
AI is potentially disruptive, but it's taking time to make its way into the workflows. 


We have to balance what they tell us they want, what they're actually willing to spend their time on, 
and then what's actually run. Those are the three different sources we use, then we blend those 
together to figure out how that maps to what we're going to see out there and what information we 
actually get from the vendors about future technologies. 


Can you tell me a little bit more about how the pandemic has impacted TACC? 


Changes will be wrought throughout society, but obviously we had operational changes. We've been 
fortunate that on our staff, a couple of people have been infected, but by and large, we've been 
healthy. We have a few people who have to go in and lay hands on the hardware to keep the 
machines running, but because we're all tech workers to some extent, it's been pretty smooth for us 
to switch to telecommuting. 


The biggest impact is that we've had to divert a lot of our resources into actually tackling this 
pandemic. As I mentioned earlier, for several months, as much as 30% of all Frontera 
and Stampede cycles were going into COVID work. 


We work with thousands of scientists. There are many we know very well, and we've had 
relationships with for a long time, and they know our systems and our platforms pretty well. And 
I've been able to say, "Yeah, let's just skip the process. We know you do good work. We know this is 
a priority right now. Let's just get going.” 


We've been able to do some pretty miraculous things in fast response, but that’s not an accident. We 
rehearsed for this by doing work on SARS, HIV, H1N1, H5N1, and 
MERS. 


There are people who dedicate their careers to this stuff, and we've been dedicated to supporting 
them. We have the infrastructure and the people and the software tools in place, and that allows us 
to respond quickly when there actually is a disaster. But we couldn't start from zero and do as much 
as we've done in this short a time without the relationships and the infrastructure in place to do this. 


When you do that, what happens to all the other work? 


It just gets put a little bit on the back burner, right? People have to wait longer to get their stuff 
through and they have less time available. We're actually planning to deploy an expansion; we're 
going to add a few hundred nodes to offset some of the time that we've lost. We'll make it up over 
several years. But we'll add some capacity because of the time we've diverted away, and the time we 
anticipate we'll continue to divert to look at these COVID things. 


We've diverted time, I'd say, in three big categories of work. One is at the atomic level, 
understanding the structure of the virion itself, understanding the structure of the cells and the drugs 
that we might wrap around it, and doing a whole bunch of protein folding and structural work with 
the light sources and cryo-EM folks to get data to confirm that stuff. It's traditional simulation. At the 
other extreme, we work on the whole person, which is the epidemiology, right? How does the virus 
spread? Doing contact tracing. Looking at cell phone data to see how interaction patterns are 
changing, and how social distancing actually reduced the number of people that you see when you 
put a regulation into effect. You can do a whole bunch of data science around mostly cell phone data 
to figure out things we couldn't do even 20 years ago. We can model how you position your 
resources, doing models to look at hospitalization rates and how many ICU beds we need to have 
available, which affects public health policy. 


Finally, in between those two is the genomic level stuff, which is coupled to and informed by the 
molecular work. Can we find, in the RNA sequence of the virus, similarities to other viruses and 
sort of treatments that are effective? Can we understand its evolution? To figure out prospective 
treatments, can we understand the hosts that it's infecting? Can we say, "These sequences tend to 
mean you're more vulnerable or less vulnerable. This part of this sequence forms these proteins, 
which makes a certain segment of the population not as vulnerable"? Can we translate those? Both 
the molecular part and the genomics part actually inform therapeutics, drug, and vaccine work. 


Looking into your crystal ball, in the next three to five years how is technology and computing 
technology going to change? And what are you designing for three to five years from now in 
your next generation that we should be thinking about? 


Yep. There are a lot of layers to that question. We have to think about how computational science is 
going to change, and then how computing technology changes. And I think there's really exciting 
things going on at both levels. 


From the science perspective, the role that AI is going to play is going to continue to augment our 
science in some very interesting ways. It is getting much cheaper to acquire data from everything, 
from massive sensor networks for both environmental and traffic analysis to 5G. It's getting easier to just get 
bits: we can put very low-power, very high-data-rate, accurate sensors out for very low cost. Fusing that data 
into the scientific workflow and using AI methods to get statistically valid ways to put that into the 
workflow is interesting. 


It's unfortunate that we're not getting a free ride out of the physics and getting higher performance 
anymore. But it does mean that it's an opportunity to be creative in terms of architecture. How do 
we use the 100 million transistors per square centimeter that we can get in any processor now? 
We're seeing sort of a plethora of these new architectures around AI chips and more GPU types, 
GPU-CPU hybrids that I think are very exciting. I think the thing that's going to help us most on 
performance is the tighter integration of memory. We can put so many transistors on a chip that we 
can do plenty of operations; we just can't get data to them fast enough. They'll start integrating 
memory into the silicon, or at least onto the package with the chip, a lot more. That's going to give 
us some huge performance increases and power efficiency improvements. We’re also broadly 
switching to liquid cooling to allow these higher densities and higher power per socket. 


We're also getting better efficiency out of that. It's raising our power per square foot, but it's 
also raising our efficiency even more than that. Data centers are going to have a lot less air moving 
around and a lot more liquid moving around, and we have to build the infrastructure for 
these larger systems of much more tightly integrated chips with a lot more heterogeneity, where we 
can be creative as architects in how to use it. 


Does storage technology have to change along with this? The way you store now may matter 
again. 


Not necessarily a different type of storage system at this point, but we're seeing different access 
patterns. For the traditional, big, 3-D simulations we've done, it was throughput of IO that mattered. 
We had these big transactions, mostly large, fairly regular, and can we just feed enough of them. 
And now, especially with graph algorithms and the AI methods, we're seeing very small IOs that are 
more frequent: small, random access. 


The good news is that, while that's hard to do on a spinning disk, it's fairly easy to do with a solid-state 
storage device, which is what we're moving to anyway. A lot of our software around building file 
systems is organized around the notion of disks that are spinning around, and we have to go get 
things off these rotating platters. There's a lot of optimization we can do for solid state that we 
haven't done yet. 


This is one of those situations where it would be nice if we could get users to give up the whole 
notion of files and opening them and move to more of an object methodology, but that's not really 
people-friendly, so I don't think that's going to happen. It's going to happen in the system software 
layers, but not in the application layers, because the core leading applications aren't going to change fast 
enough. We used to have 100 big files and now we have 10 billion little files, so we're changing the 
way that we manage storage systems. We're moving to a lot more per-user volumes that are 
dynamic instead of having one big shared file system. We're in that transition now, but I think we 
can build it on the storage blocks we have with solid state, and now with these non-volatile DIMMs, 
right? That'll be the top rung of the hierarchy. What the exact breakdown of those is going to be is hard 
to forecast at this point, but faster storage that is more amenable to random access is going to really 
help us keep the compute part saturated. 


The other piece of this is then also, you have a lot of collaborators who are now transferring 
more and more of their information to and from you. Where's network going to go? Because if 
the game is going to ramp up, the networking has to ramp up. 


Yeah, that's always been true. Just in and out of Frontera and Stampede and their archive systems, 
we move 10 or 15 terabytes a day, apiece. I think we're moving over a petabyte of data a month into 
TACC at this point. Again, I see sort of big sensor networks and 5G and stuff like that as really 
driving that stuff forward. We have been relatively successful at moving people to better protocols 
that let us get closer to full use of the wires, using technologies like Globus instead of things like HTTP 
to move data. That's been super helpful in extending the pipelines that we have, but we have 100 gig 
pipes now and we'll have 400 gig pipes in the near future to support this. We'll have it within two 
years, would be my guess. And we'll need it by then too. 
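For scale, here is a back-of-the-envelope conversion of "over a petabyte of data a month" into an average line rate. The arithmetic is ours and assumes decimal units (1 PB = 10^15 bytes) and a 30-day month:

```latex
\[
\frac{1\,\text{PB}}{30\,\text{days}}
= \frac{8 \times 10^{15}\,\text{bits}}{30 \times 86{,}400\,\text{s}}
= \frac{8 \times 10^{15}}{2.592 \times 10^{6}}\,\text{bit/s}
\approx 3.1 \times 10^{9}\,\text{bit/s}
\approx 3.1\,\text{Gb/s}
\]
```

As a monthly average that is roughly 3% of a single 100-gigabit link; the average says nothing about the peak rates that individual transfers need.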


